Optimal unbiased estimation for expected cumulative discounted cost
Authors
Abstract
Similar references
Optimal Expected Discounted Reward of a Wireless Network with Award and Cost
In this paper, we extend our previous optimization investigation on a single cell (IEEE Transactions on Wireless Communications, vol. 8, no. 2, pp. 1038-1044, 2009) to a whole network with multiple cells and further consider call admission control (CAC) based on the total expected discounted reward. Here, the system receives an award for admitting a call but incurs a cost for rejecting...
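As a rough illustration of the admission criterion described in this abstract, the sketch below estimates a total expected discounted reward by Monte Carlo simulation of a toy single-queue admission model. The arrival and service probabilities, the award, the rejection cost, the capacity, and the discount factor are all illustrative assumptions and are not taken from the paper.

```python
# Minimal Monte Carlo sketch (not the paper's model): estimate the total
# expected discounted reward of an admission policy that earns a hypothetical
# award for each admitted call and pays a cost for each rejected call.
import random

def simulate_discounted_reward(policy, arrival_p=0.3, service_p=0.4,
                               capacity=5, award=1.0, cost=0.5,
                               gamma=0.95, horizon=200):
    """Return one sample of sum_t gamma^t * reward_t under `policy`."""
    occupied, total, discount = 0, 0.0, 1.0
    for _ in range(horizon):
        if random.random() < arrival_p:                 # a call arrives
            if occupied < capacity and policy(occupied):
                occupied += 1
                total += discount * award               # award for admitting
            else:
                total -= discount * cost                # cost for rejecting
        if occupied > 0 and random.random() < service_p:
            occupied -= 1                               # a call completes service
        discount *= gamma
    return total

def estimate(policy, n=10_000):
    """Crude sample-mean estimate of the expected cumulative discounted reward."""
    return sum(simulate_discounted_reward(policy) for _ in range(n)) / n

# Example: admit whenever there is room vs. reserve the last channel.
greedy = lambda occupied: True
reserve_one = lambda occupied: occupied < 4
print(estimate(greedy), estimate(reserve_one))
```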
Total Expected Discounted Reward MDPs: Existence of Optimal Policies
This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.
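For context, the criterion referred to here can be written in standard MDP notation (this formulation is assumed from the general literature, not quoted from the article): for a policy $\pi$, initial state $x$, one-step reward $r$, and discount factor $\beta \in [0,1)$,
\[
  v_\beta^{\pi}(x) \;=\; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r(x_t, a_t)\right],
  \qquad
  v_\beta(x) \;=\; \sup_{\pi} v_\beta^{\pi}(x),
\]
and a policy $\pi^{*}$ is optimal when $v_\beta^{\pi^{*}}(x) = v_\beta(x)$ for every state $x$.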
Cumulative Risk Estimation for Chemical Mixtures
In reality, humans are almost always exposed to combinations of toxic substances and seldom to a single agent. Simultaneous exposure to a multitude of chemicals can have unexpected consequences: the combined effect may be greater or less than a simple summation of the effects induced by the chemicals individually. Here, a method is proposed for estimating the cumulative risk, which is the ...
Optimal controller/observer gains of discounted-cost LQG systems
The linear-quadratic-Gaussian (LQG) control paradigm is well known in the literature. The strategy for minimizing the cost function is available both for the case where the state is fully known and for the case where it is estimated through an observer. The situation is different when the cost function has an exponential discount factor, also known as a prescribed degree of stability. In this case, the optimal contro...
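For concreteness, one common form of the discounted quadratic cost this abstract refers to is, in standard notation (assumed rather than taken from the article): with state $x_t$, input $u_t$, weighting matrices $Q \succeq 0$, $R \succ 0$, and discount factor $\gamma \in (0,1)$,
\[
  J \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\left(x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t\right)\right],
\]
where the exponential weighting $\gamma^{t}$ is what the abstract calls a prescribed degree of stability.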
Near-optimal PAC bounds for discounted MDPs
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends lin...
Journal
Journal title: European Journal of Operational Research
Year: 2020
ISSN: 0377-2217
DOI: 10.1016/j.ejor.2020.03.072